Audio–Visual Speaker Detection using Dynamic Bayesian Networks
Not recorded
Abstract
The development of human-computer interfaces poses a challenging problem: the actions and intentions of different users have to be inferred from sequences of noisy and ambiguous sensory data. Temporal fusion of multiple sensors can be efficiently formulated using dynamic Bayesian networks (DBNs). The DBN framework allows the power of statistical inference and learning to be combined with contextual knowledge of the problem. We demonstrate the use of DBNs in tackling the problem of audio/visual speaker detection. "Off-the-shelf" visual and audio sensors (face, skin, texture, mouth motion, and silence detectors) are optimally fused along with contextual information in a DBN architecture that infers instances when an individual is speaking. Results obtained in the setup of an actual human-machine interaction system (Genie Casino Kiosk) demonstrate the superiority of our approach over that of a static, context-free fusion architecture.
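To make the idea of temporal sensor fusion concrete, the following is a minimal sketch of forward filtering in a two-state model (not speaking / speaking) with binary detector outputs. It is not the network described in the abstract: the function name, the conditional-independence assumption between detectors, the omission of contextual variables, and every probability value are illustrative assumptions chosen only to show how evidence from one frame carries over to the next.

```python
# Hypothetical parameters, chosen for illustration only.
# State 0 = not speaking, state 1 = speaking.
PRIOR = [0.5, 0.5]
TRANSITION = [[0.9, 0.1],   # P(next state | currently not speaking)
              [0.1, 0.9]]   # P(next state | currently speaking)
# EMISSION[name][s] = P(detector fires | state s); detectors are assumed
# conditionally independent given the state.
EMISSION = {
    "face": [0.5, 0.9],
    "mouth_motion": [0.2, 0.8],
    "silence": [0.9, 0.2],
}


def forward_filter(observations, prior, transition, emission):
    """Return P(speaking at t | detector outputs up to t) for each frame t."""
    belief = list(prior)
    posteriors = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Prediction step: propagate the belief through the transition model.
            belief = [
                sum(belief[i] * transition[i][j] for i in range(2))
                for j in range(2)
            ]
        # Update step: weight each state by the likelihood of the detector outputs.
        for s in range(2):
            likelihood = 1.0
            for name, fired in obs.items():
                p = emission[name][s]
                likelihood *= p if fired else 1.0 - p
            belief[s] *= likelihood
        total = sum(belief)
        belief = [b / total for b in belief]
        posteriors.append(belief[1])
    return posteriors


frames = [
    {"face": True, "mouth_motion": True, "silence": False},
    {"face": True, "mouth_motion": False, "silence": True},
]
for t, p in enumerate(forward_filter(frames, PRIOR, TRANSITION, EMISSION)):
    print(f"frame {t}: P(speaking) = {p:.3f}")
```

In this sketch, the belief from the first frame raises the speaking probability in the second frame above what the second frame's detectors alone would give, which is the kind of temporal context a static, per-frame fusion scheme cannot use.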
Similar resources
Multimodal Speaker Detection using Error Feedback Dynamic Bayesian Networks
Design and development of novel human-computer interfaces poses a challenging problem: actions and intentions of users have to be inferred from sequences of noisy and ambiguous multi-sensory data such as video and sound. Temporal fusion of multiple sensors has been efficiently formulated using dynamic Bayesian networks (DBNs) which allow the power of statistical inference and learning to be com...
Audio-Visual Speaker Detection Using Dynamic Bayesian Networks
The development of human-computer interfaces poses a challenging problem: the actions and intentions of different users have to be inferred from sequences of noisy and ambiguous sensory data. Temporal fusion of multiple sensors can be efficiently formulated using dynamic Bayesian networks (DBNs). The DBN framework allows the power of statistical inference and learning to be combined with contextual kno...
Multimodal Speaker Detection Using Input/Output Dynamic Bayesian Networks
Inferring users’ actions and intentions forms an integral part of design and development of any human-computer interface. The presence of noisy and at times ambiguous sensory data makes this problem challenging. We formulate a framework for temporal fusion of multiple sensors using input–output dynamic Bayesian networks (IODBNs). We find that contextual information about the state of the comput...
Boosting and Structure Learning in Dynamic Bayesian Networks for Audio-Visual Speaker Detection
Bayesian networks are an attractive modeling tool for human sensing, as they combine an intuitive graphical representation with efficient algorithms for inference and learning. Earlier work has demonstrated that boosted parameter learning could be used to improve the performance of Bayesian network classifiers for complex multi-modal inference problems such as speaker detection. In speaker detect...